AAAI 2020 - Human-AI Collaboration

Total: 13

#1 A Human-AI Loop Approach for Joint Keyword Discovery and Expectation Estimation in Micropost Event Detection

Authors: Akansha Bhardwaj ; Jie Yang ; Philippe Cudré-Mauroux

Microblogging platforms such as Twitter are increasingly being used for event detection. Existing approaches mainly use machine learning models and rely on event-related keywords to collect the data for model training. These approaches make strong assumptions about the distribution of the relevant microposts containing a keyword – referred to as the expectation of the distribution – and use it as a posterior regularization parameter during model training. Such approaches are, however, limited as they fail to reliably estimate the informativeness of a keyword and its expectation for model training. This paper introduces a Human-AI loop approach to jointly discover informative keywords for model training while estimating their expectation. Our approach iteratively leverages the crowd to estimate both the keyword-specific expectation and the disagreement between the crowd and the model, in order to discover new keywords that are most beneficial for model training. These keywords and their expectation not only improve the resulting performance but also make the model training process more transparent. We empirically demonstrate the merits of our approach, both in terms of accuracy and interpretability, on multiple real-world datasets, and show that our approach improves the state of the art by 24.3%.
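
To make the loop concrete, here is a minimal Python sketch. The toy corpus, the crowd oracle, and the keyword-averaging "model" are all invented stand-ins for the paper's crowdsourcing pipeline and posterior-regularized classifier; only the disagreement-driven keyword selection mirrors the idea described above.

```python
# Hypothetical sketch of the Human-AI loop: the crowd estimates a keyword's
# expectation (fraction of posts containing it that are event-relevant),
# the model is retrained, and the next keyword to annotate is the one on
# which crowd and model disagree most. All components here are toy stubs.
CORPUS = ["fire downtown now", "smoke over the bridge", "hot coffee deal",
          "fire sale on shoes", "smoke alarm tested ok", "hot weather today"]

def crowd_expectation(keyword):
    """Stand-in for a crowdsourced estimate of P(relevant | keyword)."""
    return {"fire": 0.9, "smoke": 0.6, "hot": 0.1}.get(keyword, 0.5)

def train(expectations):
    """Toy 'model': scores a post by the mean expectation of its keywords.
    The real system trains a classifier with keyword expectations as a
    posterior-regularization constraint; this stub only mimics the interface."""
    def model(post):
        hits = [v for k, v in expectations.items() if k in post]
        return sum(hits) / len(hits) if hits else 0.5
    return model

def model_expectation(model, keyword):
    posts = [p for p in CORPUS if keyword in p]
    return sum(model(p) for p in posts) / max(len(posts), 1)

def human_ai_loop(candidates, rounds=3):
    expectations, model = {}, train({})
    for _ in range(rounds):
        remaining = [k for k in candidates if k not in expectations]
        if not remaining:
            break
        # Disagreement-driven discovery: query the crowd about the keyword
        # where the crowd's and the model's expectations diverge most.
        best = max(remaining, key=lambda k: abs(crowd_expectation(k)
                                                - model_expectation(model, k)))
        expectations[best] = crowd_expectation(best)  # crowd annotates it
        model = train(expectations)                   # retrain with it
    return expectations, model

expectations, model = human_ai_loop(["fire", "smoke", "hot"])
print(expectations)
```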

#2 Just Ask: An Interactive Learning Framework for Vision and Language Navigation

Authors: Ta-Chung Chi ; Minmin Shen ; Mihail Eric ; Seokhwan Kim ; Dilek Hakkani-Tür

In the vision and language navigation task (Anderson et al. 2018), the agent may encounter ambiguous situations that are hard to interpret by relying on visual information and natural language instructions alone. We propose an interactive learning framework to endow the agent with the ability to ask for users' help in such situations. As part of this framework, we investigate multiple learning approaches for the agent, with different levels of complexity. The simplest, model-confusion-based method lets the agent ask questions based on its confusion, relying on a predefined confidence threshold of a next-action prediction model. Building on this confusion-based method, we expect the agent to demonstrate more sophisticated reasoning and to discover when and where to interact with a human. We achieve this goal using reinforcement learning (RL) with a proposed reward-shaping term, which enables the agent to ask questions only when necessary. The success rate can be boosted by at least 15% with only one question asked on average during navigation. Furthermore, we show that the RL agent is capable of adjusting dynamically to noisy human responses. Finally, we design a continual learning strategy, which can be viewed as a data augmentation method, for the agent to improve further by utilizing its interaction history with a human. We demonstrate that the proposed strategy is substantially more realistic and data-efficient compared to previously proposed pre-exploration techniques.
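
The confusion-based trigger reduces, at least in spirit, to thresholding the peak probability of the next-action predictor. A minimal sketch, assuming a softmax policy head; the 0.5 threshold and action counts are illustrative, not the paper's values:

```python
import numpy as np

def should_ask(action_logits, threshold=0.5):
    """Ask the user for help when the policy's peak confidence over the
    next navigation action falls below the threshold (i.e. it is 'confused')."""
    probs = np.exp(action_logits - action_logits.max())
    probs /= probs.sum()               # softmax over candidate actions
    return probs.max() < threshold

# A nearly uniform distribution over four actions triggers a question;
# a sharply peaked one does not.
print(should_ask(np.array([0.10, 0.20, 0.15, 0.12])))  # True
print(should_ask(np.array([4.0, 0.2, 0.1, 0.0])))      # False
```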

#3 Asymptotically Unambitious Artificial General Intelligence

Authors: Michael Cohen ; Badri Vellambi ; Marcus Hutter

General intelligence, the ability to solve arbitrary solvable problems, is supposed by many to be artificially constructible. Narrow intelligence, the ability to solve a given particularly difficult problem, has seen impressive recent development. Notable examples include self-driving cars, Go engines, image classifiers, and translators. Artificial General Intelligence (AGI) presents dangers that narrow intelligence does not: if something smarter than us across every domain were indifferent to our concerns, it would be an existential threat to humanity, just as we threaten many species despite no ill will. Even the theory of how to maintain the alignment of an AGI's goals with our own has proven highly elusive. We present the first algorithm we are aware of for asymptotically unambitious AGI, where “unambitiousness” includes not seeking arbitrary power. Thus, we identify an exception to the Instrumental Convergence Thesis, which is roughly that by default, an AGI would seek power, including over us.

#4 A Framework for Engineering Human/Agent Teaming Systems

Authors: Rick Evertsz ; John Thangarajah

The increasing capabilities of autonomous systems offer the potential for more effective teaming with humans. Effective human/agent teaming is facilitated by a mutual understanding of the team objective and of how that objective is decomposed into team roles. This paper presents a framework for engineering human/agent teams that delineates the key human/agent teaming components, using TDF-T diagrams to design the agents/teams and then to present contextualised team cognition to the human team members at runtime. Our hypothesis is that this facilitates effective human/agent teaming by enhancing the human's understanding of their role in the team and their coordination requirements. To evaluate this hypothesis, we conducted a study with human participants using our user interface for the StarCraft strategy game, which presents pertinent, instantiated TDF-T diagrams to the human at runtime. The performance of human participants in the study indicates that their ability to work in concert with the non-player characters in the game is significantly enhanced by the timely presentation of a diagrammatic representation of team cognition.

#5 What Is It You Really Want of Me? Generalized Reward Learning with Biased Beliefs about Domain Dynamics

Authors: Ze Gong ; Yu Zhang

Reward learning as a method for inferring human intent and preferences has been studied extensively. Prior approaches make an implicit assumption that the human maintains a correct belief about the robot's domain dynamics. However, this may not always hold: the human's belief may be biased, which can ultimately lead to a misguided estimation of the human's intent and preferences, since these are often derived from human feedback on the robot's behaviors. In this paper, we remove this restrictive assumption by considering that the human may have an inaccurate understanding of the robot. We propose a method called Generalized Reward Learning with biased beliefs about domain dynamics (GeReL) to infer both the reward function and the human's belief about the robot in a Bayesian setting based on human ratings. Due to the complex forms of the posteriors, we formulate the inference as a variational inference problem over the parameters that govern the reward function and the human's belief about the robot simultaneously. We evaluate our method in a simulated domain and with a user study where the user has a bias based on the robot's appearance. The results show that our method can recover the true human preferences even when the human holds such biased beliefs, in contrast to prior approaches that could have misinterpreted them completely.
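
To make the joint inference concrete, the sketch below computes an exact posterior over a small grid of (reward, belief-bias) hypotheses from simulated ratings. GeReL itself uses variational inference over continuous parameters; the grid, the Gaussian rating model, and all numbers here are invented for illustration.

```python
import numpy as np

rewards = np.linspace(0.0, 1.0, 5)      # candidate reward parameters
beliefs = np.linspace(0.5, 1.5, 5)      # candidate dynamics-bias factors
posterior = np.ones((5, 5)) / 25.0      # uniform prior over (reward, belief)

def rating_likelihood(rating, r, b):
    """The human rates a unit behavior as if the robot achieved b * 1.0,
    scaled by reward r, with Gaussian rating noise."""
    predicted = r * b
    return np.exp(-0.5 * (rating - predicted) ** 2 / 0.1)

for rating in [0.9, 0.85, 0.95]:        # observed human ratings
    for i, r in enumerate(rewards):
        for j, b in enumerate(beliefs):
            posterior[i, j] *= rating_likelihood(rating, r, b)
posterior /= posterior.sum()

i, j = np.unravel_index(posterior.argmax(), posterior.shape)
print("MAP reward:", rewards[i], "MAP belief bias:", beliefs[j])
```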

#6 Explainable Reinforcement Learning through a Causal Lens

Authors: Prashan Madumal ; Tim Miller ; Liz Sonenberg ; Frank Vetere

Prominent theories in cognitive science propose that humans understand and represent knowledge of the world through causal relationships. In making sense of the world, we build causal models in our mind to encode cause-effect relations of events and use these to explain why new events happen by referring to counterfactuals – things that did not happen. In this paper, we use causal models to derive causal explanations of the behaviour of model-free reinforcement learning agents. We present an approach that learns a structural causal model during reinforcement learning and encodes causal relationships between variables of interest. This model is then used to generate explanations of behaviour based on counterfactual analysis of the causal model. We computationally evaluate the model in 6 domains and measure performance and task prediction accuracy. We report on a study with 120 participants who observe agents playing a real-time strategy game (StarCraft II) and then receive explanations of the agents' behaviour. We investigate: 1) participants' understanding gained from explanations through task prediction; 2) explanation satisfaction; and 3) trust. Our results show that causal model explanations perform better on these measures compared to two other baseline explanation models.
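
A toy example of the counterfactual-explanation pattern, with hand-written structural equations over StarCraft-like variables. The paper learns the causal model during training; the variables and equations below are invented for illustration.

```python
def scm(workers, barracks):
    """Structural equations: supply depends on workers;
    army size depends on barracks, capped by supply."""
    supply = 10 * workers
    army = min(supply, 5 * barracks)
    return {"supply": supply, "army": army}

def explain(action_var, factual, counterfactual_inputs):
    """Contrast the factual outcome with the counterfactual one in which
    the explained action had not been taken."""
    cf = scm(**counterfactual_inputs)
    return (f"The agent built {action_var} because it raises army from "
            f"{cf['army']} (counterfactual) to {factual['army']} (actual).")

factual = scm(workers=8, barracks=3)
print(explain("barracks", factual, {"workers": 8, "barracks": 2}))
```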

#7 Relative Attributing Propagation: Interpreting the Comparative Contributions of Individual Units in Deep Neural Networks

Authors: Woo-Jeoung Nam ; Shir Gur ; Jaesik Choi ; Lior Wolf ; Seong-Whan Lee

As Deep Neural Networks (DNNs) have demonstrated superhuman performance in a variety of fields, there is increasing interest in understanding their complex internal mechanisms. In this paper, we propose Relative Attributing Propagation (RAP), which decomposes the output predictions of DNNs with a new perspective: separating the relevant (positive) and irrelevant (negative) attributions according to the relative influence between the layers. The relevance of each neuron is identified with respect to its degree of contribution, separated into positive and negative, while preserving the conservation rule. By considering the relevance assigned to neurons in terms of relative priority, RAP allows each neuron to be assigned a bi-polar importance score with respect to the output, ranging from highly relevant to highly irrelevant. Our method therefore makes it possible to interpret DNNs with much clearer and more focused visualizations of the separated attributions than conventional explanation methods. To verify that the attributions propagated by RAP correctly reflect these meanings, we use three evaluation metrics: (i) outside-inside relevance ratio, (ii) segmentation mIoU, and (iii) region perturbation. In all experiments and metrics, we show a sizable gap in comparison to the existing literature.
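
The sketch below propagates relevance through a single linear layer while splitting positive and negative contributions and approximately preserving the conservation rule. It is a generic LRP-style illustration of the idea, not the exact RAP rule from the paper.

```python
import numpy as np

def propagate_relevance(x, W, relevance_out, eps=1e-9):
    """Distribute output relevance back to inputs through z_ij = x_i * W_ij,
    keeping positive (relevant) and negative (irrelevant) flows separate so
    each input receives a bi-polar relevance score."""
    z = x[:, None] * W                        # per-connection contributions
    z_pos = np.clip(z, 0, None)               # positive part
    z_neg = np.clip(z, None, 0)               # negative part
    r_pos = z_pos / (z_pos.sum(0) + eps) * np.clip(relevance_out, 0, None)
    r_neg = z_neg / (z_neg.sum(0) - eps) * np.clip(relevance_out, None, 0)
    return (r_pos + r_neg).sum(1)             # bi-polar input relevance

x = np.array([1.0, -0.5, 2.0])
W = np.random.default_rng(0).normal(size=(3, 2))
r_in = propagate_relevance(x, W, np.array([1.0, -0.3]))
print(r_in, r_in.sum())   # total stays ≈ 1.0 - 0.3 = 0.7 (conservation)
```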

#8 Human-Machine Collaboration for Fast Land Cover Mapping

Authors: Caleb Robinson ; Anthony Ortiz ; Kolya Malkin ; Blake Elias ; Andi Peng ; Dan Morris ; Bistra Dilkina ; Nebojsa Jojic

We propose incorporating human labelers in a model fine-tuning system that provides immediate user feedback. In our framework, human labelers can interactively query model predictions on unlabeled data, choose which data to label, and see the resulting effect on the model's predictions. This bi-directional feedback loop allows humans to learn how the model responds to new data. We implement this framework for fine-tuning high-resolution land cover segmentation models and compare human-selected points to points selected using standard active learning methods. Specifically, we fine-tune a deep neural network – trained to segment high-resolution aerial imagery into different land cover classes in Maryland, USA – to a new spatial area in New York, USA using both our human-in-the-loop method and traditional active learning methods. The tight loop in our proposed system turns the algorithm and the human operator into a hybrid system that can produce land cover maps of large areas more efficiently than the traditional workflows. Our framework has applications in machine learning settings where there is a practically limitless supply of unlabeled data, of which only a small fraction can feasibly be labeled through human efforts, such as geospatial and medical image-based applications.
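
The loop structure, stripped of the actual segmentation network, looks like the sketch below. The two-feature "imagery", the perceptron head, and the simulated labeler are stand-ins for the paper's DNN fine-tuning and human interaction; only the query-label-update-inspect cycle reflects the framework described above.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))                  # stand-in 'pixels', 2 features
y_true = (X[:, 0] + X[:, 1] > 0).astype(int)   # hidden land-cover class

w = np.zeros(2)                                # tiny linear 'segmentation head'

def predict(data):
    return (data @ w > 0).astype(int)

labeled = []
for rnd in range(5):
    # Human step (simulated): inspect predictions, label a misclassified point.
    wrong = np.flatnonzero(predict(X) != y_true)
    if wrong.size == 0:
        break
    labeled.append(wrong[0])
    # Immediate fine-tune on all human-labeled points (perceptron updates).
    for _ in range(20):
        for j in labeled:
            if predict(X[j:j + 1])[0] != y_true[j]:
                w += (2 * y_true[j] - 1) * X[j]
    print(f"round {rnd}: accuracy {np.mean(predict(X) == y_true):.2f}")
```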

#9 Expectation-Aware Planning: A Unifying Framework for Synthesizing and Executing Self-Explaining Plans for Human-Aware Planning

Authors: Sarath Sreedharan ; Tathagata Chakraborti ; Christian Muise ; Subbarao Kambhampati

In this work, we present a new planning formalism called Expectation-Aware planning for decision making with humans in the loop, where the human's expectations about an agent may differ from the agent's own model. We show how this formulation allows agents not only to leverage existing strategies for handling model differences, such as explanations (Chakraborti et al. 2017) and explicability (Kulkarni et al. 2019), but also to exhibit novel behaviors that are generated through the combination of these different strategies. Our formulation also reveals a deep connection to existing approaches in epistemic planning. Specifically, we show how we can leverage classical planning compilations for epistemic planning to solve Expectation-Aware planning problems. To the best of our knowledge, the proposed formulation is the first complete solution to planning with diverging user expectations that is amenable to a classical planning compilation while successfully combining previous work on explanation and explicability. We empirically show how our approach provides a computational advantage over earlier approaches that rely on search in the space of models.

#10 Corpus-Level End-to-End Exploration for Interactive Systems

Authors: Zhiwen Tang ; Grace Hui Yang

A core interest in building Artificial Intelligence (AI) agents is to let them interact with and assist humans. One example is Dynamic Search (DS), which models the process by which a human works with a search-engine agent to accomplish a complex, goal-oriented task. Early DS agents using Reinforcement Learning (RL) have achieved only limited success, due to (1) their lack of direct control over which documents to return and (2) the difficulty of recovering from wrong search trajectories. In this paper, we present a novel corpus-level end-to-end exploration (CE3) method to address these issues. In our method, an entire text corpus is compressed into a global low-dimensional representation, which enables the agent to gain access to the full state and action spaces, including the under-explored areas. We also propose a new form of retrieval function, whose linear approximation allows end-to-end manipulation of documents. Experiments on the Text REtrieval Conference (TREC) Dynamic Domain (DD) Track show that CE3 outperforms state-of-the-art DS systems.
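
A schematic of the two ingredients CE3 combines: (1) compress the whole corpus into a low-dimensional matrix, and (2) score documents with a linear retrieval function the agent can manipulate end-to-end. The dimensions and the SVD compression below are illustrative stand-ins for the paper's representation.

```python
import numpy as np

rng = np.random.default_rng(0)
term_doc = rng.random((500, 50))            # vocabulary x documents (toy corpus)

# (1) Compress the whole corpus into a global low-dimensional representation.
U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
doc_states = (np.diag(s[:8]) @ Vt[:8]).T    # 50 documents x 8 dimensions

# (2) Linear retrieval function: the agent's action is a weight vector over
# the compressed dimensions, so ranking documents is a single matrix product
# the policy can manipulate end-to-end.
action = rng.normal(size=8)                 # would come from the RL policy
scores = doc_states @ action
print("top-5 documents:", np.argsort(-scores)[:5])
```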

#11 Learning to Interactively Learn and Assist

Authors: Mark Woodward ; Chelsea Finn ; Karol Hausman

When deploying autonomous agents in the real world, we need effective ways of communicating objectives to them. Traditional skill learning has revolved around reinforcement and imitation learning, each with rigid constraints on the format of information exchanged between the human and the agent. While scalar rewards carry little information, demonstrations require significant effort to provide and may carry more information than is necessary. Furthermore, rewards and demonstrations are often defined and collected before training begins, when the human is most uncertain about what information would help the agent. In contrast, when humans communicate objectives with each other, they make use of a large vocabulary of informative behaviors, including non-verbal communication, and often communicate throughout learning, responding to observed behavior. In this way, humans communicate intent with minimal effort. In this paper, we propose such interactive learning as an alternative to reward- or demonstration-driven learning. To accomplish this, we introduce a multi-agent training framework that enables an agent to learn from another agent who knows the current task. Through a series of experiments, we demonstrate the emergence of a variety of interactive learning behaviors, including information-sharing, information-seeking, and question-answering. Most importantly, we find that our approach produces an agent that is capable of learning interactively from a human user, without a set of explicit demonstrations or a reward function, and that achieves significantly better performance working with a human than the human performing the task alone.

#12 CG-GAN: An Interactive Evolutionary GAN-Based Approach for Facial Composite Generation

Authors: Nicola Zaltron ; Luisa Zurlo ; Sebastian Risi

Facial composites are graphical representations of an eyewitness's memory of a face. Many digital systems are available for the creation of such composites, but they either cannot reproduce features that were not designed in advance or do not allow holistic changes to the image. In this paper, we improve the efficiency of composite creation by removing the reliance on expert knowledge and letting the system learn to represent faces from examples. The novel approach, Composite Generating GAN (CG-GAN), applies generative and evolutionary computation to allow casual users to easily create facial composites. Specifically, CG-GAN utilizes the generator network of a pg-GAN to create high-resolution human faces. Users are provided with several functions to interactively breed and edit faces. CG-GAN offers a novel way of generating and handling static and animated photo-realistic facial composites, with the possibility of combining multiple representations of the same perpetrator generated by different eyewitnesses.
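
The interactive breeding step at the heart of such a system can be sketched as crossover and mutation in the generator's latent space. The generator below is a stub standing in for the pre-trained pg-GAN generator, and all hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 512                                 # pg-GAN-style latent size

def generate(z):
    """Stub for generator(z) -> image; returns the latent itself here."""
    return z

def breed(selected, population_size=8, mutation_sigma=0.3):
    """Produce a new generation of latents from the user's selected parents."""
    children = []
    for _ in range(population_size):
        a, b = rng.choice(len(selected), size=2, replace=True)
        mask = rng.random(LATENT_DIM) < 0.5      # uniform crossover
        child = np.where(mask, selected[a], selected[b])
        child = child + rng.normal(0, mutation_sigma, LATENT_DIM)  # mutation
        children.append(child)
    return np.stack(children)

population = rng.normal(size=(8, LATENT_DIM))    # initial random faces
user_picks = population[[2, 5]]                  # faces the witness prefers
population = breed(user_picks)                   # next generation to display
images = [generate(z) for z in population]
```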

#13 Querying to Find a Safe Policy under Uncertain Safety Constraints in Markov Decision Processes

Authors: Shun Zhang ; Edmund Durfee ; Satinder Singh

An autonomous agent acting on behalf of a human user has the potential of causing side-effects that surprise the user in unsafe ways. When the agent cannot formulate a policy with only side-effects it knows are safe, it needs to selectively query the user about whether other useful side-effects are safe. Our goal is an algorithm that queries about as few potential side-effects as possible to find a safe policy, or to prove that none exists. We extend prior work on irreducible infeasible sets to also handle our problem's complication that a constraint to avoid a side-effect cannot be relaxed without user permission. By proving that our objectives are also adaptive submodular, we devise a querying algorithm that we empirically show finds nearly-optimal queries with much less computation than a guaranteed-optimal approach, and outperforms competing approximate approaches.
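
The querying problem can be illustrated with a toy instance: each candidate policy requires permission for a set of side-effects, and the agent greedily queries the unknown side-effect needed by the most still-viable policies. The greedy rule is only a stand-in for the paper's adaptive-submodular objective, and the policies and user answers are invented.

```python
policies = {"P1": {"move_box", "open_door"},
            "P2": {"open_door"},
            "P3": {"move_box", "unplug_tv"}}
truth = {"move_box": True, "open_door": False, "unplug_tv": True}  # user's answers

permitted, forbidden = set(), set()
while True:
    # Policies whose required side-effects are not yet known to be unsafe.
    viable = {p: fx for p, fx in policies.items() if not (fx & forbidden)}
    safe = [p for p, fx in viable.items() if fx <= permitted]
    if safe or not viable:
        break
    # Greedy query: the unknown side-effect needed by the most viable policies.
    unknown = {e for fx in viable.values() for e in fx} - permitted
    query = max(unknown, key=lambda e: sum(e in fx for fx in viable.values()))
    (permitted if truth[query] else forbidden).add(query)   # ask the user
    print("asked about", query, "->", "safe" if truth[query] else "unsafe")

print("safe policy:", safe[0] if safe else "none exists")
```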